add script for training #928

Merged · 2 commits into PaddlePaddle:benchmark · Mar 31, 2021

Conversation

zhangting2020 (Contributor) commented:

fp32 training

  • CUDA_VISIBLE_DEVICES=0 bash script/run_fp32.sh (a sketch of the training step this exercises follows the log below)
  • log:
2021-03-30 06:55:57 [INFO]	[TRAIN] epoch: 1, iter: 20/500, loss: 2.0859, lr: 0.009657, batch_cost: 0.5344, reader_cost: 0.07273, ips: 3.7425 samples/sec | ETA 00:04:16
2021-03-30 06:56:04 [INFO]	[TRAIN] epoch: 1, iter: 40/500, loss: 1.8769, lr: 0.009295, batch_cost: 0.3483, reader_cost: 0.00205, ips: 5.7421 samples/sec | ETA 00:02:40
2021-03-30 06:56:11 [INFO]	[TRAIN] epoch: 1, iter: 60/500, loss: 1.7455, lr: 0.008931, batch_cost: 0.3532, reader_cost: 0.00204, ips: 5.6622 samples/sec | ETA 00:02:35
2021-03-30 06:56:18 [INFO]	[TRAIN] epoch: 1, iter: 80/500, loss: 1.7786, lr: 0.008566, batch_cost: 0.3501, reader_cost: 0.00193, ips: 5.7133 samples/sec | ETA 00:02:27
2021-03-30 06:56:25 [INFO]	[TRAIN] epoch: 1, iter: 100/500, loss: 1.7679, lr: 0.008199, batch_cost: 0.3523, reader_cost: 0.00153, ips: 5.6774 samples/sec | ETA 00:02:20
2021-03-30 06:56:33 [INFO]	[TRAIN] epoch: 1, iter: 120/500, loss: 1.7884, lr: 0.007830, batch_cost: 0.3754, reader_cost: 0.00179, ips: 5.3281 samples/sec | ETA 00:02:22
2021-03-30 06:56:40 [INFO]	[TRAIN] epoch: 1, iter: 140/500, loss: 1.4884, lr: 0.007459, batch_cost: 0.3723, reader_cost: 0.00111, ips: 5.3722 samples/sec | ETA 00:02:14
2021-03-30 06:56:47 [INFO]	[TRAIN] epoch: 1, iter: 160/500, loss: 1.8324, lr: 0.007086, batch_cost: 0.3515, reader_cost: 0.00187, ips: 5.6902 samples/sec | ETA 00:01:59
2021-03-30 06:56:54 [INFO]	[TRAIN] epoch: 1, iter: 180/500, loss: 1.9415, lr: 0.006711, batch_cost: 0.3519, reader_cost: 0.00200, ips: 5.6841 samples/sec | ETA 00:01:52
2021-03-30 06:57:01 [INFO]	[TRAIN] epoch: 1, iter: 200/500, loss: 1.9368, lr: 0.006333, batch_cost: 0.3551, reader_cost: 0.00222, ips: 5.6317 samples/sec | ETA 00:01:46
2021-03-30 06:57:08 [INFO]	[TRAIN] epoch: 1, iter: 220/500, loss: 1.8381, lr: 0.005953, batch_cost: 0.3518, reader_cost: 0.00172, ips: 5.6844 samples/sec | ETA 00:01:38
2021-03-30 06:57:15 [INFO]	[TRAIN] epoch: 1, iter: 240/500, loss: 1.3836, lr: 0.005571, batch_cost: 0.3534, reader_cost: 0.00135, ips: 5.6593 samples/sec | ETA 00:01:31
2021-03-30 06:57:22 [INFO]	[TRAIN] epoch: 1, iter: 260/500, loss: 1.7486, lr: 0.005185, batch_cost: 0.3539, reader_cost: 0.00142, ips: 5.6513 samples/sec | ETA 00:01:24
2021-03-30 06:57:30 [INFO]	[TRAIN] epoch: 1, iter: 280/500, loss: 1.4495, lr: 0.004796, batch_cost: 0.3524, reader_cost: 0.00084, ips: 5.6755 samples/sec | ETA 00:01:17
2021-03-30 06:57:37 [INFO]	[TRAIN] epoch: 1, iter: 300/500, loss: 1.3292, lr: 0.004404, batch_cost: 0.3523, reader_cost: 0.00106, ips: 5.6762 samples/sec | ETA 00:01:10
2021-03-30 06:57:44 [INFO]	[TRAIN] epoch: 1, iter: 320/500, loss: 1.7622, lr: 0.004007, batch_cost: 0.3515, reader_cost: 0.00167, ips: 5.6892 samples/sec | ETA 00:01:03
2021-03-30 06:57:51 [INFO]	[TRAIN] epoch: 1, iter: 340/500, loss: 1.7007, lr: 0.003606, batch_cost: 0.3537, reader_cost: 0.00092, ips: 5.6552 samples/sec | ETA 00:00:56
2021-03-30 06:57:58 [INFO]	[TRAIN] epoch: 1, iter: 360/500, loss: 1.3338, lr: 0.003201, batch_cost: 0.3523, reader_cost: 0.00087, ips: 5.6766 samples/sec | ETA 00:00:49
2021-03-30 06:58:05 [INFO]	[TRAIN] epoch: 1, iter: 380/500, loss: 2.0620, lr: 0.002789, batch_cost: 0.3543, reader_cost: 0.00151, ips: 5.6455 samples/sec | ETA 00:00:42
2021-03-30 06:58:12 [INFO]	[TRAIN] epoch: 1, iter: 400/500, loss: 1.8045, lr: 0.002370, batch_cost: 0.3534, reader_cost: 0.00150, ips: 5.6599 samples/sec | ETA 00:00:35
2021-03-30 06:58:19 [INFO]	[TRAIN] epoch: 1, iter: 420/500, loss: 1.7951, lr: 0.001943, batch_cost: 0.3551, reader_cost: 0.00129, ips: 5.6319 samples/sec | ETA 00:00:28
2021-03-30 06:58:26 [INFO]	[TRAIN] epoch: 1, iter: 440/500, loss: 1.1775, lr: 0.001506, batch_cost: 0.3535, reader_cost: 0.00163, ips: 5.6579 samples/sec | ETA 00:00:21
2021-03-30 06:58:33 [INFO]	[TRAIN] epoch: 1, iter: 460/500, loss: 1.2738, lr: 0.001053, batch_cost: 0.3530, reader_cost: 0.00118, ips: 5.6661 samples/sec | ETA 00:00:14
2021-03-30 06:58:40 [INFO]	[TRAIN] epoch: 1, iter: 480/500, loss: 1.6528, lr: 0.000577, batch_cost: 0.3552, reader_cost: 0.00095, ips: 5.6313 samples/sec | ETA 00:00:07
2021-03-30 06:58:47 [INFO]	[TRAIN] epoch: 1, iter: 500/500, loss: 1.8418, lr: 0.000037, batch_cost: 0.3555, reader_cost: 0.00121, ips: 5.6256 samples/sec | ETA 00:00:00
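
The script itself is not shown in this conversation; for context, what run_fp32.sh ultimately drives is an ordinary single-precision training step. The snippet below is a minimal PaddlePaddle dygraph sketch of such a step, with a toy model, fake data, and placeholder hyperparameters (none of these come from the PR):

import paddle

# Illustrative sketch only: a toy model and fake data stand in for the real
# segmentation model/dataset; this is not the actual content of run_fp32.sh.
model = paddle.nn.Sequential(
    paddle.nn.Conv2D(3, 8, 3, padding=1),
    paddle.nn.Conv2D(8, 2, 1),
)
criterion = paddle.nn.CrossEntropyLoss(axis=1)
optimizer = paddle.optimizer.Momentum(learning_rate=0.01,
                                      parameters=model.parameters())

for step in range(20):
    images = paddle.randn([2, 3, 64, 64])        # fake image batch
    labels = paddle.randint(0, 2, [2, 64, 64])   # fake segmentation masks

    logits = model(images)                       # forward pass in FP32
    loss = criterion(logits, labels)
    loss.backward()                              # backward pass in FP32
    optimizer.step()
    optimizer.clear_grad()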

fp16 training

  • CUDA_VISIBLE_DEVICES=0 bash script/run_fp16.sh (a mixed-precision training-step sketch follows the log below)
  • log:
2021-03-30 06:47:34 [INFO]	[TRAIN] epoch: 1, iter: 20/500, loss: 2.1627, lr: 0.009657, batch_cost: 3.7035, reader_cost: 0.10451, ips: 1.0800 samples/sec | ETA 00:29:37
2021-03-30 06:47:39 [INFO]	[TRAIN] epoch: 1, iter: 40/500, loss: 1.5413, lr: 0.009295, batch_cost: 0.2542, reader_cost: 0.00146, ips: 15.7371 samples/sec | ETA 00:01:56
2021-03-30 06:47:44 [INFO]	[TRAIN] epoch: 1, iter: 60/500, loss: 1.8485, lr: 0.008931, batch_cost: 0.2501, reader_cost: 0.00264, ips: 15.9925 samples/sec | ETA 00:01:50
2021-03-30 06:47:49 [INFO]	[TRAIN] epoch: 1, iter: 80/500, loss: 1.2860, lr: 0.008566, batch_cost: 0.2545, reader_cost: 0.00218, ips: 15.7167 samples/sec | ETA 00:01:46
2021-03-30 06:47:54 [INFO]	[TRAIN] epoch: 1, iter: 100/500, loss: 1.7187, lr: 0.008199, batch_cost: 0.2495, reader_cost: 0.00119, ips: 16.0289 samples/sec | ETA 00:01:39
2021-03-30 06:47:59 [INFO]	[TRAIN] epoch: 1, iter: 120/500, loss: 1.5156, lr: 0.007830, batch_cost: 0.2468, reader_cost: 0.00245, ips: 16.2096 samples/sec | ETA 00:01:33
2021-03-30 06:48:04 [INFO]	[TRAIN] epoch: 1, iter: 140/500, loss: 1.7103, lr: 0.007459, batch_cost: 0.2528, reader_cost: 0.00272, ips: 15.8254 samples/sec | ETA 00:01:30
2021-03-30 06:48:09 [INFO]	[TRAIN] epoch: 1, iter: 160/500, loss: 1.6280, lr: 0.007086, batch_cost: 0.2477, reader_cost: 0.00161, ips: 16.1488 samples/sec | ETA 00:01:24
2021-03-30 06:48:14 [INFO]	[TRAIN] epoch: 1, iter: 180/500, loss: 1.2007, lr: 0.006711, batch_cost: 0.2534, reader_cost: 0.00231, ips: 15.7883 samples/sec | ETA 00:01:21
2021-03-30 06:48:19 [INFO]	[TRAIN] epoch: 1, iter: 200/500, loss: 1.3577, lr: 0.006333, batch_cost: 0.2489, reader_cost: 0.00196, ips: 16.0723 samples/sec | ETA 00:01:14
2021-03-30 06:48:24 [INFO]	[TRAIN] epoch: 1, iter: 220/500, loss: 1.4601, lr: 0.005953, batch_cost: 0.2541, reader_cost: 0.00311, ips: 15.7389 samples/sec | ETA 00:01:11
2021-03-30 06:48:29 [INFO]	[TRAIN] epoch: 1, iter: 240/500, loss: 1.2289, lr: 0.005571, batch_cost: 0.2484, reader_cost: 0.00206, ips: 16.1050 samples/sec | ETA 00:01:04
2021-03-30 06:48:34 [INFO]	[TRAIN] epoch: 1, iter: 260/500, loss: 1.3100, lr: 0.005185, batch_cost: 0.2522, reader_cost: 0.00236, ips: 15.8584 samples/sec | ETA 00:01:00
2021-03-30 06:48:39 [INFO]	[TRAIN] epoch: 1, iter: 280/500, loss: 1.2623, lr: 0.004796, batch_cost: 0.2498, reader_cost: 0.00137, ips: 16.0097 samples/sec | ETA 00:00:54
2021-03-30 06:48:44 [INFO]	[TRAIN] epoch: 1, iter: 300/500, loss: 1.3036, lr: 0.004404, batch_cost: 0.2451, reader_cost: 0.00100, ips: 16.3213 samples/sec | ETA 00:00:49
2021-03-30 06:48:49 [INFO]	[TRAIN] epoch: 1, iter: 320/500, loss: 1.3872, lr: 0.004007, batch_cost: 0.2497, reader_cost: 0.00230, ips: 16.0179 samples/sec | ETA 00:00:44
2021-03-30 06:48:54 [INFO]	[TRAIN] epoch: 1, iter: 340/500, loss: 1.3194, lr: 0.003606, batch_cost: 0.2483, reader_cost: 0.00101, ips: 16.1099 samples/sec | ETA 00:00:39
2021-03-30 06:48:59 [INFO]	[TRAIN] epoch: 1, iter: 360/500, loss: 1.2632, lr: 0.003201, batch_cost: 0.2444, reader_cost: 0.00031, ips: 16.3683 samples/sec | ETA 00:00:34
2021-03-30 06:49:04 [INFO]	[TRAIN] epoch: 1, iter: 380/500, loss: 1.1085, lr: 0.002789, batch_cost: 0.2464, reader_cost: 0.00062, ips: 16.2314 samples/sec | ETA 00:00:29
2021-03-30 06:49:09 [INFO]	[TRAIN] epoch: 1, iter: 400/500, loss: 1.2770, lr: 0.002370, batch_cost: 0.2543, reader_cost: 0.00228, ips: 15.7290 samples/sec | ETA 00:00:25
2021-03-30 06:49:14 [INFO]	[TRAIN] epoch: 1, iter: 420/500, loss: 1.2780, lr: 0.001943, batch_cost: 0.2490, reader_cost: 0.00076, ips: 16.0658 samples/sec | ETA 00:00:19
2021-03-30 06:49:19 [INFO]	[TRAIN] epoch: 1, iter: 440/500, loss: 1.2682, lr: 0.001506, batch_cost: 0.2539, reader_cost: 0.00076, ips: 15.7552 samples/sec | ETA 00:00:15
2021-03-30 06:49:24 [INFO]	[TRAIN] epoch: 1, iter: 460/500, loss: 1.3521, lr: 0.001053, batch_cost: 0.2490, reader_cost: 0.00032, ips: 16.0646 samples/sec | ETA 00:00:09
2021-03-30 06:49:29 [INFO]	[TRAIN] epoch: 1, iter: 480/500, loss: 1.2111, lr: 0.000577, batch_cost: 0.2496, reader_cost: 0.00101, ips: 16.0265 samples/sec | ETA 00:00:04
2021-03-30 06:49:34 [INFO]	[TRAIN] epoch: 1, iter: 500/500, loss: 1.1864, lr: 0.000037, batch_cost: 0.2475, reader_cost: 0.00116, ips: 16.1611 samples/sec | ETA 00:00:00
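
Compared with the FP32 log, batch_cost drops from roughly 0.35 s to about 0.25 s and ips rises from about 5.7 to about 16 samples/sec once the first (warm-up) iteration is excluded. FP16 training in Paddle is typically wired up with paddle.amp.auto_cast for the forward pass plus a GradScaler for dynamic loss scaling; the sketch below shows that general pattern on the same toy setup as the FP32 example, and is not the actual contents of script/run_fp16.sh:

import paddle

# Illustrative sketch only: same toy setup as the FP32 example, using the
# paddle.amp auto_cast/GradScaler pattern; not the actual run_fp16.sh.
model = paddle.nn.Sequential(
    paddle.nn.Conv2D(3, 8, 3, padding=1),
    paddle.nn.Conv2D(8, 2, 1),
)
criterion = paddle.nn.CrossEntropyLoss(axis=1)
optimizer = paddle.optimizer.Momentum(learning_rate=0.01,
                                      parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)  # dynamic loss scaling

for step in range(20):
    images = paddle.randn([2, 3, 64, 64])
    labels = paddle.randint(0, 2, [2, 64, 64])

    with paddle.amp.auto_cast():        # cast-safe ops run in FP16
        logits = model(images)
        loss = criterion(logits, labels)

    scaled = scaler.scale(loss)         # scale the loss to avoid FP16 underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)  # unscale gradients and apply the update
    optimizer.clear_grad()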

@wuyefeilin (Collaborator) left a comment:
LGTM

@wuyefeilin merged commit a7bcc8a into PaddlePaddle:benchmark on Mar 31, 2021